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Abstract 

Flood quantile estimation is of great importance for many engineering studies and policy deci- 
sions. However, practitioners must often deal with small data available. Thus, the information 
must be used optimally. In the last decades, to reduce the waste of data, inferential method- 
ology has evolved from annual maxima modeling to peaks over a threshold one. To mitigate 
the lack of data, peaks over a threshold are sometimes combined with additional information 
- mostly regional and historical information. However, whatever the extra information is, the 
most precious information for the practitioner is found at the target site. In this study, a model 
that allows inferences on the whole time series is introduced. In particular, the proposed model 
takes into account the dependence between successive extreme observations using an appropri- 
ate extremal dependence structure. Results show that this model leads to more accurate flood 
peak quantile estimates than conventional estimators. In addition, as the time dependence is 
taken into account, inferences on other flood characteristics can be performed. An illustration 
is given on flood duration. Our analysis shows that the accuracy of the proposed models to 
estimate the flood duration is related to specific catchment characteristics. Some suggestions 
to increase the flood duration predictions are introduced. 



28 1 Introduction 



Estimation of extreme flood events is an important stage for many engineering designs and risk 
management. This is a considerable task as the amount of data available is often small. Thus, 
to increase the precision and the quality of the estimates, several au t hors use extra information 
in ad ditio n to the target site one . For example , 



Ribatet et al 



20071 and 



Cunderlik and Ouarda 



Werrittv et al 



3 



200611 and 



2007a 



Kieldsen and Jones 



20061 . 



200611 add informa t ion fr om other homogeneous gaging stations. 



Reis Jr. and Stedingei 



2005l | use historical information to improve 



inferences. Incorporation of extra information in the estimation procedure is attractive but it 



Ribatet et al. 



2007a]. Before looking at 



should not be more prominent than the original data 
other kinds of information, it seems reasonable to use efficiently the one available at the target site. 
Most often, practitioners have initially the whole time series, not only the extreme observations. In 
particular, it is a considerable waste of information to reduce a time series to a sample of Annual 
Maxima (AM). 

In this perspective, the Peaks Over Threshold (POT) approach is less wasteful as more than one 
event per year could be inferred. However, the declustering method used to identify independent 
events is quite subjective. Furthermo re, even though a "quasi automatic" procedure was recently 



introduced by 
are used. 

Coles et al. 



Ferro and Segers 



2003| , there is still a waste of information as only cluster maxima 



199J and 



Smith et al 



1997l | propose an approach using Markov chain models 
that uses all exceedances and accounts for temporal dependence between successive observations. 
Finally, the entire informatio n available within the time series is taken into account. More recently. 



Fawcett and Walshaw 



2006l | give an illustrative application of the Markov chain model to extreme 
wind speed modeling. 

In this study, extreme flood events are of interest. The performance of the Markov chain model 
is compared to the conventional POT approach. The data analyzed consist of a collection of 50 
French gaging stations. The area under study ranges from 2°W to 7°E and from 45°N to 51°N. The 
drainage areas vary from 72 to 38300 km^ with a median value of 792 km^. Daily observations were 
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recorded from 39 to 105 years, with a mean value of 60 years. For the remainder of this article, the 
quantile benchmark values are derived from the maximum likelihood estimates on the whole times 
series using a conventional POT analysis. 

The paper is organized as follows. Section [5] introduces the theoretical aspects for the Markov 
chain model, while Section [3] checks the relevance of the Markovian model hypothesis. Section d] 
and [5] analyze the performance of the Markovian model to estimate the flood peaks and durations 
respectively. Finally, some conclusions and perspectives are drawn in Section [S) 

2 A Markov Chain Model for Cluster Exceedances 

In this section, the extremal Markov chain model is presented. In the remainder of this article, it 
is assumed that the flow Yt at time t depends on the value Yt-i at time t — 1. The dependence 
between two consecutive observations is modeled by a first order Markov chain. Before introducing 
the theoretical aspects of the model, it is worth justifying and describing the main advantages of 
the proposed approach. 

It is now well-known that the univariate Extreme Value Theory (EVT) is relevant when mod- 
eling either AM or POT. Nevertheless, its extension to the multivariate case is surprisingly rarely 
applied in practice. This work aims to motivate the use of the Multivariate EVT (MEVT). In 
our application, the multivariate results are used to model the dependence between a set of lagged 
values in a times series. Consequently, compared to the AM or the POT approaches, the amount 
of observations used in the inference procedure is clearly larger. For instance, while only cluster 
maxima are used in a POT analysis, all exceedances are inferred using a Markovian model. 

2.1 Likelihood function 

Let Yi, . . . ,Yn be a stationary first-order Markov chain with a joint distribution function of two 
consecutive observations j/2), and F{y) its marginal distribution. Thus, the likelihood function 
L evaluated at points (j/i, . . . , ?/„) is: 



o = /(yi)n/(y 



nr=2 
nr=2V(y.) 



(1) 



i=2 lli=2 

79 where /(t/i) is the marginal density, f{yi\yi-i) is the conditional density, and /(j/i, ^i-i) is the joint 

80 density of two consecutive observations. 

81 To model all exceedances above a sufficiently large threshold m, the joint and marginal densities 



Coles 



200 ll | justify the use of a Generalized 



82 must be known. Standard univariate EVT arguments 

83 Pareto Distribution (GPD) for f{yi) - e.g. a term of the denominator in equation ([1]). As a 

84 consequence, the marginal distribution is defined by: 



F(y) = l-A 1+e 



a 



y>u 



(2) 



85 where = max(0,a;), A = Pr[y > uV a an d ^ are the scale and shape parameters respectively. 



Similarly, MEVT arguments 



Resnick 



1987l | argue for a bivariate extreme value distribution for 



87 /(yij^i-i) " 6.g. a term of the numerator in equation ll}. Thus, the joint distribution is defined 

88 by: 



-F(2/i,y2) = exp[-V"(zi,Z2)] , yi>u, y2>u 



(3) 



89 where ^ is a homogeneous function of order -1, e.g. V{nzi,nz2) — n'~^V(zi, Z2), satisfying 

90 V{zi, 00) = Zi^ and V{oo, Z2) = Z2^ , and Zi = —1/ logi^(yi), « = 1, 2. 

91 Contrary to the univariate case, there is no finite parametrizatio n for the V func tions. Thus, it is 

92 commo n to use spec ific parametric families for V such as th e logistic 



logist ic 



Tawn 



19881 1 , the negative logistic 



Galambos 



Gumbel 



1960l | , the asymmetric 



1975( 1 or the asymmetric negative logistic [Joe, 



1990| models. Some details for these parametrisations are report ed in Annex XI T hese models, as 



95 all models of the form ([3|) are asymptotically dependent, that is 



Coles et al. 



1999 1 
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X=limx(t^) =liin^^iPr[F(Y2) > uj\F{Yi) > oj] >0 



X = lim x{^) 



1* 21og(l— u;) -I 



(4) 
(5) 



96 Other para metric families ex i st to consider simultaneously asymptotically dependent and inde- 

97 pendent cases 



Bortot and Tawn 



1998|. However, apart from a few particular cases (see Section 



98 the data analyzed here seem to belong to the asymptotically dependent class. Consequently, in this 

99 work, only asymptotically dependent models are considered - i.e. of the form 

100 2.2 Inference 

101 The Markov chain model is fitted using maximum censored likelihood estimation [ Ledford and 



Tawn, 



1996|. The contribution (2/1,2/2) of a point (2/1,1/2) to the numerator of equation ((T|) is 



given by: 



in (2/1,^2) = < 



exp [-V{zi,Z2)] [Vi{zi,Z2)V2{zi,Z2) - 1^12(21,2:2)] K1K2, if yi > u,y2 > u 

exp [-y(zi,Z2)] Vi{zi,Z2)Ki, if 2/1 > u,2/2 < " 

exp [-y(zi,Z2)] 12(21,22)^^^2, if 2/1 <u,y2>u 

exp[~y(zi,Z2)] , if 2/1 < 2/2 < w 



(6) 



whereXj = — AjO-^-^i^^^z| exp(l/zj), tj ~ [l+^(yj— u)/(t]^"^^^ and V,-, V12 are the partial derivative 
with respect to the component j and the mixed partial derivative respectively. The contribution 
LdiVj) of a point yj to the denominator of equation ([T]) is given by: 



a-^X [1 + - , if 2/j > 



(7) 



1-A, 



otherwise. 
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Finally, the log-likelihood is given by: 



n-l 



logi(j/i, • . -^Vn) = ^ login (y,_i,?;0 - ^\ogLd{y,) 



(8) 



108 2.3 Return levels 

109 Most often, the major issue of an extreme value analysis is the quantile estimation. As for the POT 
no approach, return level estimates can be computed. However, as all exceedances are inferred, this 

111 is done in a different way as the dependence between successive observations must be taken into 

112 account. Fo r a stationarv sequence Yi . 1? Yn with a marginal distribution function F. Lindgren 



and Rootzen 



19871 1 have shown that: 



Pr [max{ri, ■■■,Yr,}<y]^ F{yY 



(9) 



whe re 6 g [0, 11 i s the extremal index and can be interpreted as the reciprocal of the mean cluster 



115 size [Leadbetteij . Il983l | - i.e. 6 = 0.5 means that extreme (enough) events are expected to occur by 

116 pair. 6 — \ (resp. 6 ^ Q) corresponds to the independent (resp. perfect dependent) case. 

117 As a consequence, the quantile Qt corresponding to the T-year return period is obtained by 

118 equating equation © to 1 — l/T and solving for T. By definition, Qt is the observation that is 

119 expected to be exceeded once every T years, i.e. 



Qt = u- aC' (l [l - (1 - 1/T)VM)] } 



(10) 



120 It is worth emphasizing equation ^ as it has a large impact on both theoretical and practical 

121 aspects. Indeed, for the AM approach, equation ^ is replaced by 



Pr [max{yi, ^2, ■■■,Yn}<y]^ G{y) 



(11) 



122 where G is the distribution function of the random variable Af„ = maxjYi, Y2, . . . ,Yn}, that is a 

123 generalized extreme value distribution. In particular, the equations ([9]) and (fTTj) differ as the first 
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124 one is fitted to the whole observations Y^, while the latter is fitted to the AM ones. By definition, the 

125 number ny of the Yi observations is much larger than the size um of the AM data set. Especially, 

126 for daily data, ny = 365?!^/. 

127 [Figure 1 about here.] 

128 From equation ([TO]) , the extremal index 6 must be known to o btain quantile estimate s . Th e 



129 methodology applied in this study is similar to the one suggested by 



Fawcett and Walshaw 



2006 1 



Once the Markovian model is fitted, 100 Markov chains of length 2000 were generated. For e ach 



Ferro and Segers 



20031 ] to 



chain, the extremal index is estimated using the estimator proposed by 
avoid issues related to the choice of declustering parameter. In particular, the extremal index 9 is 
estimated using the following equations: 



e{u) 



max 1, 



(JV-l)E""l'(T,-l)(T.-2) 



if max {T, : 1 <i < N -1} <2 



otherwise 



(12) 



where N is the number of observations exceeding the threshold u, Ti is the inter-exceedance time, 
e.g. Ti — Si-^-l ~ Si and the Si is the i-th exceedance time. 

Lastly, the extremal index related to a fitted Markov chain model is estimated using the sample 
mean of the 100 extremal index estimations. Figure[l]represents the histogram of these 100 extremal 
index estimations. In this study, as lots of time series are involved, the number and length of the 
simulated Markov chains may be too small to lead to the most accurate extremal index estimations; 
but avoid intractable CPU times. If less sites are considered, it is preferable to increase these two 
values. 

A preliminary study (not shown here) demonstrates that, for quantile e stima tion, this procedure 



was more accurate th an estimating 9 using the estimator of 



144 conclusions drawn by 



Fawcett 



Leadbetter 



1983^ . This confirms the 



20051 ] for the extreme wind speed data. 
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3 Extreme Value Dependence Structure Assessment 



146 Prior to performing any estimations, it is necessary to test whether: (a) the first order Markov 

147 chain assumption and (b) the extreme value dependence structure (equation ([3])) are appropriate 

148 to model successive observations above the threshold u. 

149 [Figure 2 about here.] 

150 [Figure 3 about here.] 

151 Figures [2] and [3] plot the auto-correlation functions and the scatter plots between two consecutive 

152 observations for two different gaging stations. As the partial autocorrelation coefficient at lag 1 

153 is large, Figure [2] and [3] (left panels) corroborate the (a) hypothesis. However, as some partial 

154 auto-correlation coefficients are significant beyond lag f , it may suggest that a higher-order model 

155 may be more appropri ate but does not necessa rily mean that a first-order assumption is completely 

156 flawed. Simplex plots 



Coles and Tawn 



199X1 ] (not shown) can be used to assess the suitability of 
a second-order assumption over a first-order one. For our application, it seems that a first-order 
model seems to be valid - except for the five slowest dynamic catchments. 

Thou gh it is an important stage because of its consequences on guantile estimates [ Ledford 



and Tawn, 



1996 



Bortot and Coles 



3 



20001 ] ■ verifying the (b) hypothesis is a considerable task. An 



overwhelming dependence between consecutive observations at finite levels is not sufficient as it 
does not give any information about the dependence relation at asymptotic levels. For instance, 
the overwhelming dependence at lag 1 (Figure [5] and [31 right panels) does certainly not justify the 
use of an asymptotic dependent model. 

[Figure 4 about here.] 

[Figure 5 about here.] 

Figures m and [5] plot the evolution of the x(cj) and x{^) statistics as lo increases for two different 
sites. For these figures, the confidence intervals are derived by bootstrapping contig uous blocks 



to take into account the successive observations dependence 



Ledford and Tawn 



20031 . The x{^) 



170 and statistics seem to depiet two different asymptotic extremal dependence. From Figure |31 

171 it seems that limxi^^) ^ and limx(ti)) — 1 for lu ^ 1. On the contrary, Figure [5] advocates 

172 for Ihnxi^) = and Umx(cLi) < 1 for u 1. Consequently, Figure 5] seems to conclude for an 

173 asymptotic dependent case while Figure [5] for an asymptotic independent case. 

174 In theory, asymptotic (in)dcpendencc should not be assessed using scatterplots. However, these 

175 two different features can be deduced from Figures and [31 For Figure [H the scatterplot (Yt-i, Yt) 

176 is increasingly less spread as the observations becomes larger; while increasingly more spread for 

177 Figure [3l In other words, for the first case, the dependence seems to become stronger at larger 

178 levels while this is the contrary for the second case. 

179 [Table 1 about here.] 

180 [Figure 6 about here.] 

181 Two specific cases for different asymptotic dependence structures were illustrated. Table[2]shows 

182 the evolution of the x{uj) statistics as w increases for all the sites under study. Most of the stations 

183 have significantly positive xi^) values. In addition, only 13 sites have a 95% confidence interval 

184 that contains the value. For 9 of these stations, the 95% confidence intervals correspond to the 

185 theoretical lower and upper bounds; so that uncertainties are too large to determine the extremal 

186 dependence class. For the x statistic, results are less clear-cut. Figure [5] represents the histograms 

187 for x('^) for successive lu values. Despite only a few observations being close to 1, most of the 

188 stations have a value greater than 0.75. These values can be considered as significantly high 

189 as —1 < x{^) ^ I7 for all uj. Consequently, models of the form ((l])-([3]) may be suited to model the 

190 extremal dependence between successive observations. 
Other methods exist to test the extremal depe ndence but were unconvin cing for our application 



Ledford and Tawn 



2003 



Falk and Michel 



200d ]. Indeed, the approach of 



Falk and Michel 



2006 ] 



does not take into account the dependence between y>-i and K: while the test of Ledford and 



Tawn 



3 



20031 ] appears to be poorly discriminatory for our case study. 
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195 4 Performance of the Markovian Models on Quantile Esti- 

196 mat ion 

197 4.1 Comparison between Markovian estimators 

198 In this section, the performance of six different extremal dependence structures is analyzed on the 

199 50 gaging stations introduced in section [1] These models are: log for the logistic, nlog for the 

200 negative logistic, mix for the mixed models and their relative asymmetric counterparts - e.g. alog, 

201 anlog and amix. To assess the impact of the dependence structure on flood peak estimation, the 

202 efficiency of each model to estimate quantiles with return periods 2, 10, 20, 50 and 100 years is 

203 evaluated. 

204 As practitioners often have to deal with small record lengths in practice, the performance of the 

205 Markovian models is analyzed on all sub time series of length 5, 10, 15 and 20 years. Finally, to 

206 assess the efficiency for all the gaging stations considered in this study, the normalized bias (nbias) , 

207 the variance (var) and the normalized mean squared error (nmse) are computed: 



nHas = (13) 
var = _i_ g j^fe-^ - nfe^a.^ (14) 



i=l 



208 where Qt is the benchmark T-year return level and Qi^r is the j-th estimate of Qt- 

209 [Figure 7 about here.] 

210 Figure [7] depicts the nbias densities for Q20 with a record length of 5 years. It is overwhelming 

211 that the extremal dependence structure has a great impact on the estimation of Q2o- Comparing 

212 the two panels, it can be noticed that the symmetric dependence structures give spreader densities; 
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213 that is, more variable estimates. Independently of the symmetry, Figure [7] shows that the mixed 

214 dependence family is more accurate. 

215 [Table 2 about here.] 

216 Tabic [3] shows the nbias, var and nmse statistics for all the Markovian estimators as the record 

217 length increases for quantile Q^q. This table confirms results derived from Figure [71 Indeed, the 

218 asymmetric dependence structures give less variable and biased estimates - as their nbias and var 

219 statistics are smaller. In addition, whatever the record length is, the Markovian models perform 

220 with the same hierarchy; that is the mix and amix models are by far the most accurate estimators 

221 - i.e. with the smallest nmse values. The same results (not shown) have been found for other 

222 quant iles. 

223 From an hydrological point of view, these two results are not surprising. The symmetric models 

224 suppose that the variables Yt and Yt+i are exchangeable. In our context, exchangeability means 

225 that the time series are reversible - e.g. the time vector direction has no importance. When dealing 

226 with AM or POT and stationary time series, it is a reasonable hypothesis. For example, the MLE 

227 remains the same with any permutations of the AM/POT sample. However, when modeling all 

228 exceedances, the time direction can not be considered as reversible as flood hydrographs are clearly 

229 non symmetric. 

230 [Figure 8 about here.] 



The Pickands' dependence function A{u!) 



Pickandd . 



198l\ is another representation for the 



232 extremal dependence structure for any extreme value distribution. A{uj) is related to the V function 

233 in equation ([3]) as follows: 

M^) = -1 , ^ = — - — (16) 

234 Figure [8] represents the Pickands' dependence function for all the gaging stations and the three 

235 asymmetric Markovian models. One major specificity of the mixed models is that these models can 

236 not account for perfect dependence cases. In particular, the Pickands' dependence functions for 
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237 the mixed models satisfy A{Q.5) > 0.75 while ^1(0.5) G [0.5, 1] for the logistic and negative logistic 

238 models. From Figure[51 it can be seen that only few stations have a dependence function that could 

239 not be modeled by the amix model. Therefore, the dependence range limitation of the amix model 

240 does not seem too restrictive. 

241 In this section, the effect of the extremal dependence structure has been assessed. It has been 

242 established that the symmetric models are hydrologically inconsistent as they could not reproduce 

243 the flood event asymmetry. In addition, for all the quantiles analyzed, the asymmetric mixed model 

244 is the most accurate for flood peak estimations. Therefore, in the remainder of this section, only 

245 the amix model will be compared to conventional POT estimators. 



246 4.2 Comparison between amix and conventional POT estimators 

247 In this section, the performance of the amix estimator is compared to the estimators usually used 

248 in flood frequency analysis. For this purpose, the quantile estimates derived from the Maximum 

249 Likelihood Estimator (ML E), the Unbiased and Biased Probability Weighted moments estimators 



Hosking and Wallis 



19871 1 (PWU and PWB respectively) are considered. 

251 [Figure 9 about here.] 

252 Figure [5] depicts the nbias densities for the amix, MLE, PWU and PWB estimators related 

253 to the Qs, QiQ and Q20 estimations with a record length of 5 years. It can be seen that amix is the 

254 most accurate model for all quantiles. Indeed, the amix nbias densities are the most sharp with a 

255 mode close to 0. Focusing only on "classical" estimators (e.g. MLE, PWU and PWB), there is 

256 no estimator that perform better than any other anytime. These two results advocate the use of 

257 the amix model. 

258 [Table 3 about here.] 

259 Table m shows the performance of each estimator to estimate Q^o as the record length increases. 

260 It can be seen that the amix model performs better than the conventional estimators for the whole 

261 range of record lengths analyzed. First, amix has the same bias than the conventional estimators. 
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262 Thus, the amix dependence structure seems to be suited to estimate flood quantile estimates. 

263 Second, because of its smaller variance, amix is more accurate than AILE, PWU and PWB 

264 estimators. This smaller variance is mainly a result of all of the exceedances (not only cluster 

265 maxima) being used in the inference procedure. Consequently, the amix model has a smaller nmse 

266 - around half of the conventional models ones. 

267 [Figure 10 about here.] 

268 Figure [TOl shows the evolution of the nmse as the return period increases for the amix, MLE, 

269 PWU and PWB models. This figure corroborates the conclusions drawn from Figure[9]and TablelH 

270 It can be seen that the amix model has the smallest nmse, independently of the return period and 

271 the record length. In addition, the amix becomes increasingly more efficient as the return period 

272 increases - mostly for return periods greater than 20 years. While the conventional estimators 

273 present an erratic nmse behavior as the return period increases, the amix model is the only one 

274 that has a smooth evolution. To conclude, these results confirm that the amix model clearly 

275 improves fiood peak quantile estimates - especially for large return periods. 



5 Inference on Other Flood Characteristics 



277 As all exceedances are modeled using a first order Markov chain, it is possible to infer other quan- 

278 titles than fiood peaks - e.g. volume and duration. In this section, the ability of these Markovian 

279 models to reproduce the flood duration is analyzed. For this purpose, the most severe flood hy- 

280 drographs within each year are considered and normalized by their peak values. Consequently, 

281 from this obs erved normalized hydrograph set, two flood ch aracteristics derived from a data set of 



hydrographs 



Robson and Reedl . 



1999 



Sauauet et al 



20081 1 are considered: (a) the duration d„ 



above 0.5 of the normalized hydrograph set mean and (b) the median dmed of the durations above 
0.5 of each normalized hydrograph. 
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285 5.1 Global Performance 

286 [Figure 11 about here.] 

287 Figure 111! plots the flood durations dmean and dmed biases derived from the three asymmetric 

288 Markovian models in function of their empirical estimates. It can be seen that no model leads to 

289 accurate flood duration estimations. In addition, the extremal dependence structure has a clear 

290 impact on these estimations. In particular, the anlog and amix models seem to underestimate 

291 the flood durations, while the alog model leads to overestimations. Consequently, two different 

292 conclusions can be drawn. First, as large durations are poorly estimated, higher order Markov 

293 chains may be of interest. However, this is a considerable task as higher dimensional multivariate 

294 extreme value distributions often lead to numerical problems. Instead of considering higher order, 

295 another alternative may be to change daily observations for d-day observations - where d is larger 

296 than 1. Second, it is overwhelming that the extremal dependence structure affects the flood duration 

297 estimations. As noticed in Section r2.11 there is no finite parametrization for the extremal dependence 

298 structure V - see Equation Consequently, it seems reasonable to suppose that one suited for 

299 flood hydrograph estimation may exist. 

300 [Figure 12 about here.] 

301 Figure fT2] depicts the observed normalized mean hydrographs and the ones predicted by the three 

302 asymmetric Markovian models. For the J0621610 station (left panel), the normalized hydrograph is 

303 well estimated by the three models; whereas for the L0400610 station (right panel), the normalized 

304 hydrograph is poorly predicted. This result confirms the inability of the three Markovian models 

305 to reproduce long flood events with daily data and a first order Markov chain. 

306 [Figure 13 about here.] 

307 Figure 1131 represents the biases related to each value of the normalized mean hydrograph. In 

308 addition, to help estimator comparison, the nmse is reported at the right side. It can be seen 

309 that the alog model dramatically overestimates the hydrograph rising limb while giving reasonable 

310 estimations for the falling phase. The anlog model slightly overestimates the rising part while 
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311 strongly underestimates the falling one. The amix model always leads to underestimations - this 

312 is more pronounced for the falling limb. However, despite these different behaviors, these three 

313 estimators seems to have a similar performance - in terms of nmse. 

314 [Figure 14 about here.] 

315 Figure [14] represents the spatial distribution of the nmse on the normalized mean hydrograph 

316 estimation for each Markovian model. It seems that there is a specific spatial distribution. In 

317 particular, the worst cases are related to the middle part of France. In addition, for different 

318 extremal dependence structures, the best nmse values correspond to different spatial locations. 

319 The alog model is more accurate for the extreme north part of France; the anlog model is more 

320 efficient for the east part of France; while the amix model performs best in the middle part of 

321 France. Consequently, as at a global scale no model is accurate to estimate the normalized mean 

322 hydrograph, it is worth trying to identify which catchment types are related to the best estimations. 

323 For our data set, this is a considerable task. No standard statistical technique lead to reasonable 

324 results. In particular, the principal component analysis, hierarchical classification, sliced inverse 

325 regression lead to no conclusion about which catchment types are more suitable for our models. 

326 Only a regression approach gives some first guidelines. For this purpose, a regression between the 

327 nbias on the dmean estimation for each asymmetric model and some geomorphologic and hydrologic 



the hypsometric curve 



Roche 



1963|, the Base Flow Index (BFI) 



Section 5.3.3] and an index characterizing the rainfall persistence 
considered. 



Tallaksen and Van Lanen 




2004, 


Vaskova and Frances. 


199J 


i\ are 



nbias {dmean, anlog) = 0.89 - 2.19BF/, i?^ = 0.40 (17) 
nbias {dmean, amix) = 0A9-1.74:BFI, i?^ = 0.43 (18) 

332 From equations (|17p and (US]), the BFI variable explains around 40% of the variance. Despite 

333 the fact that a large variance proportion is not taken into account, the BFI is clearly related to the 
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334 dmean estimation performance. These equations indicate that the anlog (resp. amix) model is more 

335 accurate to reproduce the dmean variable for gaging stations with a BFI around 0.4 (resp. 0.28). 

336 These BFI values correspond to catchments with moderate up to flash flow regimes respectively. 

337 These results corroborate the ones derived from Figure [T51 the first order Markovian models with a 

338 1-day lag conditioning are not appropriate for long fiood duration estimations. Consequently, while 

339 no physiographic characteristic is related to the alog performance; it is suggested, for such 1-day 

340 lag conditioning, to use the anlog and amix models for quick basins. 

341 6 Conclusion 

342 Despite that univariate EVT is widely applied in environmental sciences, its multivariate extension 

343 is rarely considered. This work tries to promote the use of the MEVT in hydrology. In this work, the 

344 bivariate case was considered as the dependence between two successive observations was modeled 

345 using a first order Markov chain. This approach has two main advantages for practitioners as: (a) 

346 the number of data to be inferred increases considerably and (b) other features can be estimated - 

347 flood duration, volume. 

348 In this study, a comparison between six different extremal dependence structures (including 

349 both symmetric and asymmetric forms) has been performed. Results show that an asymmetric 

350 dependence structure is more relevant. From a hydrological point of view, this asymmetry is 

351 rational as flood hydrographs are asymmetric. In particular, for our data, the asymmetric mixed 

352 model gives the most accurate flood peak estimations and clearly improves flood peak estimations 

353 compared to conventional estimators independently of the return period considered. 

354 The ability of these Markovian models to estimate the flood duration was carried out. It has 

355 been shown that, at first sight, no dependence structure is able to reproduce the fiood hydrograph 

356 accurately. However, it seems that the anlog and amix models may be more appropriate when 

357 dealing with moderate up to fiash flow regimes. These results depend strongly on the conditioning 

358 term (i.e. Pr[Yt < yt\Yt-s = Vt-s]) of the flrst order Markov chain and on the auto-correlation 

359 within the time series. In our application, (5 = 1 and daily time step was considered. 



16 



More general conclusions can be drawn. The weakness of the proposed models to derive consis- 
tent flood hydrographs may not be related to the daily time step but to the inadequacy between the 
conditioning term and the flood dynamics . To ensure better results, higher order Markov chains 



may be of interest 



Fawcett and Walshaw 



20001 . However, as numerical problems may arise, an- 



364 other alternative may be to still consider a first order chain but to change the "conditioning lag 

365 value" d. In particular, for some basins, it may be more relevant to condition the Markov chain 

366 with a larger but more appropriate lag value. 

367 Another option to improve the proposed models for flood hydrograph estimation is to use a more 

368 suitable dependence function V. As there is no finite parametrization for the extremal dependence 

369 structure, it seems reasonable that one more appropriate for flood hydrographs may exist. In this 

370 work, results show that the anlog model is more able to reproduce the hydrograph rising part, while 

371 the alog is better for the falling phase. Define 



V izi,Z2) ^ aVi izi,Z2) + (3V2 (^1,2:2) 

372 where Vi (resp. V2) is the extremal dependence function for the alog (resp. anlog) model and a and 

373 P are real constants such as a + f3 — I. By definition, y is a new extremal dependence function. 

374 In particular, V may combine the accuracy of the alog and anlog models for both the rising 

375 and falling part of the flood hydrograph. Another alternative may be to look at non-parametric 

376 Pickands' dependence function estimators 



Caperaa et al 



1997l | but that will require techniques to 



simulate Markov chains from these non-parametric e stimations 

All statistical analysis were performed with in the 
In particular, the POT package 



Ribatet 



R Development Core Team 



2007l | framework. 



2007l | integrates the tools that were developed to carry out 

380 the modeling effort presented in this paper. This package is available, free of charge, at the website 

381 http : //www . R-pro j ect . org section CRAN, Packages or at its own webpage http : //pot . r-f orge . r-pro j ect . org/[ 
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Table 1: Partial and mixc;(l partial derivatives, definition domain, total independent and perfect dependent cases for each 
extremal asymmetric dependence function V. 



Model 



Asymmetric Models 



alog 



anlog 



an 



V{x,y) 

Vi{x,y) 

V2{x,y) 

Vi2{x,y) 

A{w) 

Independence 
Total dependence 
Constraint 



1-01 I 1-92 I 

X y 



.141 - ef 



1-02 



^{0102)^ {xyy 



-l/a 
-1/a 
-l/a 
-l/a 



-l/a 
-l/a 
l/a 



a-2 



(1 - (9i) (1 - W) + (1 - 6I2) W + (1-W)a6'f +W^9^ 

a = 1 or 6*1 = or 6»2 = 
a ^ 

< a < 1, < 6ii,6i2 < 1 



X y 



-l/a-l 
-1/a-l 



(^)"" + (f)~"_ 

a — »■ 1 or 6*1 ^ or (?2 — 
a +00 
a > 0, < 6»i,6i2 < 1 



-l/a-2 



X y 



(2a 



2a-\-e I 
(x+J/)^ 

ct+e , 

(x+j/)^ ^ 



60+49 



a = 
Never 
a>Q,a + 2e 
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Figure 1: Histogram of the extremal index estimations from the 100 simulated Markov Chains of 
length 2000. 
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Figure 3: Autocorrelation plot (left panel) and scatterplot of the time series at lag 1 (right panel) 
for the Moselle river at Noirgueux (A4200630). 



26 




27 




Figure 5: Plot of the x s-nd x statistics and the related 95% intervals for the Moselle river at 
Noirgueux (A4200630). The solid blue lines are the theoretical bounds. 
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Figure 6: Histograms of the x{^) statistics for different (j values. Left panel: u) = 0.98, middle 
panel: iv = 0.985 and right panel: w = 0.99. 
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Figure 7: Densities of the normalized biases of Q20 estimates for the symmetric Markovian models 
(left panel) and the asymmetric ones (right panel). Target site record length: 5 years. 
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Figure 8: Representation of the Pickands' dependence functions for the 50 gaging stations. Left 
panel : alog, middle panel: anlog and right panel: amix. "+" represents the theoretical dependence 
bound for the amix model. 
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Figure 9: Densities of the normalized biases for the amix model and the MLE, PWU, and PWB 
estimators for quantiles Q5 (left panel), Qio (middle panel) and Q20 (right panel). Record length: 
5 years. 
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Figure 11: d„iean and dmed normalized biases in function of the theoretical values for the three 
asymmetric Markovian models. 
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Figure 12: Observed and simulated normalized mean hydrographs for the J0621610 (left panel) and 
the L0400610 (right panel) stations. 
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Figure 13: Evolution ol the biases for the normalized mean hydrograph estimations in function of 
the distance of the flood peak time. 
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Figure 14: nmse spatial distribution according for the three Markovian models. Left panel: alog, 
middle panel: anlog and right panel: amix. The radius is proportional to the nmse value. 
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Table 2: x(^) statistics for all stations, w = 0.98, 0.985, 0.99. 



Stations 




a; 


= 0.98 










= 0.985 






= 0.99 






x( 


w) 


95% C.I. 


x( 


UJ) 




95% C.I. 


x(w) 




95% C.I. 


A3472010 





67 


(-0.02, 


1 


00) 





60 


(-0 


02, 


1 


00) 


0.57 


(-0 


01, 


1 


00) 


A4200630 





53 


( 0.21, 





81) 





45 


( 


07, 





77) 


0.38 


(-0 


01, 





76) 


A4250640 





55 


( 0.27, 





82) 





49 


( 


18, 





76) 


0.41 


( 


02, 





71) 


A5431010 





44 


(-0.02, 


1 


00) 





44 


(-0 


02, 


1 


00) 


0.41 


(-0 


01, 


1 


00) 


A5730610 





59 


( 0.25, 





94) 





56 


( 


20, 





90) 


0.50 


( 


07, 





97) 


A6941010 





62 


( 0.22, 





99) 





60 


( 


16, 


1 


00) 


0.56 


( 


06, 


1 


00) 


A6941015 





63 


( 0.29, 





95) 





60 


( 


20, 





96) 


0.58 


( 


17, 





98) 


D0137010 





39 


( 0.04, 





69) 





33 


(-0 


02, 





67) 


0.28 


(-0 


01, 





69) 


D0156510 





59 


( 0.25, 





88) 





55 


( 


20, 





86) 


0.53 


( 


14, 





92) 


E1727510 





62 


( 0.18, 





91) 





59 


( 


16, 





93) 


0.47 


(-0 


01, 





89) 


E1766010 





63 


( 0.23, 





98) 





59 


( 


17, 





96) 


0.54 


( 


09, 





96) 


E3511220 





59 


( 0.10, 


1 


00) 





53 


(-0 


02, 


1 


00) 


0.50 


(-0 


01, 





99) 


E4035710 





77 


( 0.02, 


1 


00) 





68 


(-0 


02, 


1 


00) 


0.60 


(-0 


01, 


1 


00) 


E5400310 





88 


( 0.30, 


1 


00) 





89 


( 


29, 


1 


00) 


0.83 


( 


13, 


1 


00) 


E5505720 





91 


( 0.24, 


1 


00) 





87 


( 


09, 


1 


00) 


0.86 


( 


02, 


1 


00) 


E6470910 





96 


( 0.40, 


1 


00) 





94 


( 


25, 


1 


00) 


0.98 


( 


00, 


1 


00) 


H0400010 





84 


( 0.12, 


1 


00) 





83 


( 


02, 


1 


00) 


0.78 


(-0 


01, 


1 


00) 


H1501010 





82 


( 0.36, 


1 


00) 





90 


( 


39, 


1 


00) 


0.84 


( 


26, 


1 


00) 


H2342010 





68 


( 0.31, 


1 


00) 





67 


( 


25, 


1 


00) 


0.60 


( 


11, 


1 


00) 


H5071010 





75 


( 0.30, 


1 


00) 





76 


( 


22, 


1 


00) 


0.75 


( 


15, 


1 


00) 


H5172010 





80 


( 0.47, 


1 


00) 





77 


( 


42, 


1 


00) 


0.73 


( 


30, 


1 


00) 


H6201010 





69 


( 0.29, 


1 


00) 





69 


( 


14, 


1 


00) 


0.69 


( 


08, 


1 


00) 


H7401010 





85 


( 0.46, 


1 


00) 





85 


( 


38, 


1 


00) 


0.81 


( 


27, 


1 


00) 


19221010 





67 


( 0.23, 


1 


00) 





66 


( 


19, 


1 


00) 


0.59 


( 


04, 


1 


00) 


J0621610 





61 


( 0.25, 





92) 





58 


( 


20, 





94) 


0.51 


( 


08, 





91) 


K0433010 





59 


( 0.22, 





91) 





54 


( 


15, 





89) 


0.45 


( 


00, 





85) 


K0454010 





71 


( 0.37, 


1 


00) 





67 


( 


24, 


1 


00) 


0.65 


( 


14, 


1 


00) 


K0523010 





62 


(-0.02, 


1 


00) 





58 


(-0 


02, 


1 


00) 


0.53 


(-0 


01, 


1 


00) 


K0550010 





61 


( 0.22, 





94) 





57 


( 


15, 





94) 


0.54 


( 


07, 


1 


00) 


K0673310 





67 


( 0.24, 


1 


00) 





65 


( 


18, 


1 


00) 


0.66 


( 


07, 


1 


00) 


K0910010 





65 


(-0.02, 


1 


00) 





61 


(-0 


02, 


1 


00) 


0.58 


(-0 


01, 


1 


00) 


K1391810 





68 


( 0.27, 


1 


00) 





64 


( 


16, 





98) 


0.60 


( 


06, 





96) 


K1503010 





69 


( 0.38, 





98) 





67 


( 


30, 





98) 


0.64 


( 


23, 


1 


00) 


K2330810 





68 


( 0.29, 


1 


00) 





66 


( 


22, 


1 


00) 


0.62 


( 


09, 


1 


00) 


K2363010 





65 


( 0.26, 





98) 





66 


( 


16, 


1 


00) 


0.61 


( 


01, 


1 


00) 


K2514010 





61 


( 0.24, 


1 


00) 





61 


( 


21, 


1 


00) 


0.58 


( 


12, 


1 


00) 


K2523010 





53 


(-0.02, 


1 


00) 





53 


(-0 


02, 


1 


00) 


0.51 


(-0 


01, 


1 


00) 


K2654010 





68 


( 0.37, 


1 


00) 





68 


( 


31, 


1 


00) 


0.60 


( 


10, 


1 


00) 


K2674010 





60 


( 0.25, 





89) 





58 


( 


22, 





94) 


0.54 


( 


08, 





95) 


K2871910 





62 


( 0.26, 





95) 





57 


( 


15, 





94) 


0.56 


( 


10, 





97) 


K/(5(54UiU 


u 


ol 


( 0.25, 


1 


00) 


u 


57 


( 


17, 





97) 


o.oy 


( 


16, 


1 


00) 


K3222010 





56 


( 0.21, 





90) 





53 


( 


18, 





93) 


0.46 


( 


11, 





89) 


K3292020 





59 


( 0.27, 





91) 





57 


( 


17, 





91) 


0.48 


( 


07, 





90) 


K4470010 





76 


( 0.39, 


1 


00) 





77 


( 


40, 


1 


00) 


0.73 


( 


27, 


1 


00) 


K5090910 





64 


( 0.27, 





93) 





§^ 


( 


26, 





96) 


0.58 


( 


12, 





98) 


K5183010 





57 


( 0.14, 





91) 





56 


( 


15, 





96) 


0.53 


( 


06, 





97) 


K5200910 





63 


( 0.24, 





93) 





62 


( 


20, 





95) 


0.56 


( 


11, 





97) 


L0140610 





73 


( 0.23, 


1 


00) 





66 


( 


15, 


1 


00) 


0.58 


(-0 


01, 


1 


00) 


L0231510 





59 


( 0.16, 





91) 





55 


( 


11, 





92) 


0.53 


(-0 


01, 





92) 


L0400610 





74 


(-0.02, 


1 


00) 





65 


(-0 


02, 


1 


00) 


0.61 


(-0 


01, 


1 


00) 



Table 3: Several characteristics of the Markovian estimators on Qso estimation as the record length 
increases. Standard errors are reported in brackets. 



Model 




5 years 






10 years 






15 years 






20 yea 


nbias 


var 


nrnse 


nbias 


var 


nmse 


nbias 


var 


nmse 


nbias 


var 


log 


-0.35 


0.54 


0.66 


-0.32 


0.32 


0.42 


-0.30 


0.23 


0.32 


-0.28 


0.17 




(16c-3) 


(22c-3) 


(18c-3) 


(12e-3) 


(12e-3) 


(14e-3) 


(lle-3) 


(9e-3) 


(12c-3) 


(9e-3) 


(7e-3) 


nlog 


-0.21 


0.20 


0.24 


-0.20 


0.11 


0.15 


-0.18 


0.08 


0.12 


-0.18 


0.06 




(lOe-3) 


(7e-3) 


(lle-3) 


(7e-3) 


(4e-3) 


(9e-3) 


(6e-3) 


(3e-3) 


(8e-3) 


(5e-3) 


(2e-3) 


mix 


-0.08 


0.14 


0.14 


-0.07 


0.08 


0.08 


-0.06 


0.05 


0.06 


-0.05 


0.04 




(8e-3) 


(5e-3) 


(8e-3) 


(6e-3) 


(2e-3) 


(6e-3) 


(5e-3) 


(2e-3) 


(5e-3) 


(4e-3) 


(le-3) 


alog 


-0.15 


0.39 


0.41 


-0.13 


0.22 


0.24 


-0.11 


0.16 


0.17 


-0.10 


0.12 




(14e-3) 


(15e-3) 


(14c-3) 


(lOc-3) 


(9c-3) 


(llc-3) 


(9c-3) 


(6c-3) 


(9c-3) 


(8c-3) 


(4c-3) 


anlog 


-0.10 


0.20 


0.21 


-0.09 


0.11 


0.12 


-0.08 


0.08 


0.09 


-0.08 


0.06 




(lOe-3) 


(7e-3) 


(lOe-3) 


(7e-3) 


(4e-3) 


(8e-3) 


(6e-3) 


(2e-3) 


(6e-3) 


(5e-3) 


(2e-3) 


amix 


-0.06 


0.11 


0.12 


-0.05 


0.06 


0.06 


-0.04 


0.04 


0.05 


-0.03 


0.03 




(7e-3) 


(4e-3) 


(7e-3) 


(6e-3) 


(2e-3) 


(6e-3) 


(5e-3) 


(le-3) 


(5e-3) 


(4e-3) 


(le-3) 
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Table 4: Several characteristics of the amix, MLE, PWU and PWB estimators for Q^q estimation 



as the record length increases. Standard errors are reported in brackets. 



Model 




5 years 






10 years 






15 years 






20 years 




nbias 


var 


nmse 


nbias 


var 


nmse 


nbias 


var 


nmse 


nbias 


var 


nn 


amix 


-0.06 


0.11 


0.12 


-0.05 


0.06 


0.07 


-0.04 


0.04 


0.05 


-0.04 


0.03 


0.1 




(8e-3) 


(4e-3) 


(8e-3) 


(6e-3) 


(2e-3) 


(6e-3) 


(5e-3) 


(le-3) 


(5e-3) 


(4e-3) 


(le-3) 


(4e 


MLE 


-0.13 


0.25 


0.27 


-0.14 


0.13 


0.14 


-0.13 


0.08 


0.10 


-0.11 


0.05 


0.1 




(12e-3) 


(15e-3) 


(12c-3) 


(8e-3) 


(6e-3) 


(9e-3) 


(7c-3) 


(3e-3) 


(7e-3) 


(5e-3) 


(2e-3) 


(6e 


PWU 


0.08 


0.30 


0.31 


-0.01 


0.15 


0.15 


-0.03 


0.10 


0.10 


-0.03 


0.06 


0.1 




(13e-3) 


(13e-3) 


(13c-3) 


(9e-3) 


(6c-3) 


(9e-3) 


(7c-3) 


(3c-3) 


(7e-3) 


(6c-3) 


(2c-3) 


(6e 


PWB 


-0.07 


0.20 


0.21 


-0.10 


0.11 


0.12 


-0.11 


0.08 


0.09 


-0.10 


0.05 


0.1 




(lOe-3) 


(8e-3) 


(lle-3) 


(7e-3) 


(4e-3) 


(8e-3) 


(6e-3) 


(2e-3) 


(7e-3) 


(5e-3) 


(le-3) 


(6e 
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Table 5: Partial and mixed partial derivatives, definition domain, total independent and perfect 
dependent cases for each extremal symmetric dependence function V. 



Model 










Symmetric Models 








log 




nlog 


mia; 


V{x,y) 




{x- 




a 


111 / a 1 a^~l/cK 

J + ^ ~ (a; + y ) 


1 _|_ 1 _ g 

a; 1/ x+y 


Vi{x,y) 




-x~ 


-i-W{x,y)^ 


-1 




1 1 a 
^ (x+y)-^ 


V2{x,y) 




-y~ 




-1 




1 1 a 
¥ ^ {x+y)-^ 


Vi2{x,y) 




i-il_^V{x,y)^ 


-(a + (x" + 


2a 

(x+yf 


A{w) 






— W)" +10" 




1 - [(1 - w)-" 


1 — w {1 — w)a 


Independence 






a = 1 




a 


a = 


Total dependence 










a — »■ +0O 


Never reached 


Constraint 






< a < 1 




a > 


< a < 1 
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